Abstract

A mode of a multiset S is an element a ∈ S of maximum multiplicity; that is, a occurs at least
as frequently as any other element in S. Given a list A[1 : n] of n items, we consider the problem of
constructing a data structure that efficiently answers range mode queries on A. Each query consists of an
input pair of indices (i, j) for which a mode of A[i : j] must be returned. We present an O(n^{2-2ε})-space
static data structure that supports range mode queries in O(n^ε) time in the worst case, for any fixed
ε ∈ [0, 1/2]. When ε = 1/2, this corresponds to the first linear-space data structure to guarantee O(√n)
query time. We then describe three additional linear-space data structures that provide O(k), O(m),
and O(|j - i|) query time, respectively, where k denotes the number of distinct elements in A and m
denotes the frequency of the mode of A. Finally, we examine generalizing our data structures to higher
dimensions.

1 Introduction 

Mode and Range Queries. The frequency of an element x in a multiset S, denoted freq_S(x), is the
number of occurrences (i.e., the multiplicity) of x in S. A mode of S is an element a ∈ S such that for all
x ∈ S, freq_S(x) ≤ freq_S(a). A multiset S may have multiple distinct modes; the frequency of the modes of
S, denoted by m, is unique.

Along with the mean and median of a multiset, the mode is a fundamental statistic of data analysis for
which efficient computation is necessary. Given a sequence of n elements ordered in a list A, a range query
seeks to compute the corresponding statistic on the multiset determined by a subinterval of the list: A[i : j].
The objective is to preprocess A to construct a data structure that supports efficient response to one or more
subsequent range queries, where the corresponding input parameters (i, j) are provided at query time.

We assume the RAM model of computation with word size Θ(log u), where elements are drawn from a
universe U = {0, ..., u - 1}. Although the complete set of possible queries can be precomputed and stored
using Θ(n^2) space, practical data structures require less storage while still enabling efficient response time.
For all i, if i = j, then a range query must report A[i]. Consequently, any range query data structure for a list
of n items requires Ω(n) storage space in the worst case [?]. This leads to a natural question: how quickly can
an O(n)-space data structure answer range queries? The problem of constructing efficient data structures
for range median queries has been analyzed extensively [?]. A range
mean query is equivalent to a normalized range sum query (partial sum query), for which a precomputed
prefix-sum array provides a linear-space static data structure with constant query time [30]. As expressed
recently by Brodal et al. regarding the current status of the range mode query problem: "The problem of
finding the most frequent element within a given array range is still rather open." [?, page 2]. See Section 2
for an overview of the current state of the range mode query problem.
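
To make the prefix-sum remark above concrete, the following is a minimal Python sketch of a range mean query answered in constant time after linear preprocessing. It is our own illustration rather than code from [30]; the function names are hypothetical, and indices are 0-based and inclusive, unlike the 1-based A[1 : n] used in the text.

```python
from itertools import accumulate


def build_prefix_sums(A):
    """P[x] holds the sum of the first x items of A, so P has len(A) + 1 entries."""
    return [0] + list(accumulate(A))


def range_mean(P, i, j):
    """Mean of A[i..j] (0-based, inclusive) from the prefix sums P in O(1) time."""
    return (P[j + 1] - P[i]) / (j - i + 1)


P = build_prefix_sums([2, 4, 6, 8])
print(range_mean(P, 1, 2))  # mean of [4, 6]
```

No analogous subtraction trick is available for the mode, which is what makes the range mode problem harder.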

Our Results. Given an array A[1 : n] of n items, we present an O(n^{2-2ε})-space static data structure that
supports range mode queries in O(n^ε) time in the worst case, for any fixed ε ∈ [0, 1/2]. When ε = 1/2,
this corresponds to the first linear-space data structure to guarantee O(√n) query time. Prior to our work,
the previous fastest linear-space data structure by Krizanc et al. [30] supported range mode queries in
O(√n log log n) time; our data structure borrows ideas developed by Krizanc et al. and augments their data
structure to eliminate dependence on predecessor queries (see Lemma 4). We describe three additional
O(n)-space data structures that provide O(k), O(m), and O(|j - i|) query time, respectively, where k denotes
the number of distinct elements in A and m denotes the frequency of the mode of A. Finally, we discuss
generalizations of our data structures to d dimensions for any fixed d. To the authors' knowledge, this is the
first examination of multidimensional range mode query.

2 Related Work 

Computing a Mode. The mode of a multiset S of n items can be found in O(n log n) time by sorting S and
scanning the sorted list to identify the longest sequence of identical items. Due to the corresponding lower
bound on the worst-case time for solving the element uniqueness problem, finding a mode requires Ω(n log n)
time in the worst case; that is, the decision problem of determining whether m > 1 requires Ω(n log n) time
in the worst case [36]. Better bounds on the worst-case time are obtained by parameterizing in terms of m or
k. A worst-case time of O(n log k) is easily achieved by inserting the n elements into a balanced search tree
in which each node stores a key and its frequency. Munro and Spira [32] describe an O(n log(n/m))-time
algorithm for finding a mode and a corresponding lower bound of Ω(n log(n/m)) on the worst-case time.

If distinct elements in S can be mapped efficiently (i.e., in constant time) to distinct integers in the range
{1, ..., k'}, for some k', then a mode of S can be found in O(n + k') time using O(n + k') space. This is
achieved by identifying a maximum element in a frequency table for S of size k'. This method is analogous
to counting sort. A similar algorithm for computing a mode can be implemented using hash tables.
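
The two approaches just described can be sketched as follows. This is an illustrative Python fragment under our own assumptions: the function names are hypothetical, the input is assumed to be non-empty, and `mode_by_counting` assumes the items are already integers in {1, ..., k'}.

```python
def mode_by_sorting(S):
    """O(n log n): sort, then find the longest run of identical items."""
    items = sorted(S)
    best, best_len, run_start = items[0], 0, 0
    for x in range(1, len(items) + 1):
        # A run ends at the end of the list or where the value changes.
        if x == len(items) or items[x] != items[run_start]:
            if x - run_start > best_len:
                best, best_len = items[run_start], x - run_start
            run_start = x
    return best, best_len


def mode_by_counting(S, k_prime):
    """O(n + k'): frequency table over the integer range {1, ..., k'}."""
    table = [0] * (k_prime + 1)  # slot 0 is unused
    for v in S:
        table[v] += 1
    best = max(range(1, k_prime + 1), key=table.__getitem__)
    return best, table[best]
```

Both functions return a pair (mode, frequency); when several modes exist, each returns one of them.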

We include the following lemma to which we refer in Section 3.

Lemma 1 (Krizanc et al. [30]) Let A and B be any multisets. If c is a mode of A ∪ B and c ∉ A, then
c is a mode of B.

Range Mode Query. Naturally, a mode of the query interval A[i : j] can be computed directly without
preprocessing using any of the methods described in Section 2. Krizanc et al. [30] describe data structures that
provide constant-time queries using O(n^2 log log n / log n) space and O(n^ε log n)-time queries using O(n^{2-2ε})
space, for any fixed ε ∈ (0, 1/2]. Petersen and Grabowski [34] improve the first bound to constant time
and O(n^2 log log n / log^2 n) space, and Petersen [33] improves the second bound to O(n^ε)-time queries using
O(n^{2-2ε}) space, for any fixed ε ∈ [0, 1/2). When ε = 1/2, the data structure of Krizanc et al. [30] requires only
linear space and provides O(√n log log n) query time. Although its space requirement is almost linear in n
as ε approaches 1/2, the data structure of Petersen [33] requires ω(n) space. Furthermore, the construction
becomes impractical as ε approaches 1/2 (the number of levels in a hierarchical set of tables and hash
functions approaches ∞ as ε → 1/2) and no obvious modification reduces its space requirement to O(n).
Greve et al. [25] prove a lower bound of Ω(log n / log(s · w/n)) query time for any data structure that uses s
memory cells of w bits.

Bose et al. [7] consider approximate range mode queries, in which the objective is to return an element
whose frequency is at least α · m. They give a data structure that requires O(n/(1 - α)) space and answers
approximate range mode queries in O(log log_{1/α} n) time for any fixed α ∈ (0, 1), as well as data structures
that provide constant-time queries for α ∈ {1/2, 1/3, 1/4}, using space O(n log n), O(n log log n), and O(n),
respectively. Greve et al. [25] give a linear-space data structure that supports approximate range mode queries
in constant time for α = 1/3, and an O(n · α/(1 - α))-space data structure that supports approximate range
mode queries in O(log(α/(1 - α))) time for any fixed α ∈ [1/2, 1).

Continuous Space versus Array Input. A vast literature studies the problems of geometric range
searching in continuous Euclidean space; that is, data points are positioned arbitrarily in R^d. See the survey
by Agarwal [1] for an overview of results. The range query problems considered in this paper, however,
restrict attention to array input. Although a range query on an array can be viewed as a restricted case of
a more general range searching problem (e.g., a point set with regular spacing), the algorithmic techniques
differ greatly between the two settings when d ≥ 2. When d = 1, however, a geometric range mode query
problem reduces to array range mode query. In particular, the rank of each data point in Euclidean space
corresponds to its array index. It suffices to compute the ranks of the respective successor and predecessor
of the endpoints of the query interval to identify the indices i and j, and to return the corresponding array
range mode query on A[i : j].
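
The reduction for d = 1 can be sketched as follows. This is a hedged illustration only: it assumes the point coordinates are stored in a sorted Python list, uses binary search in place of a dedicated successor/predecessor structure, and returns 0-based inclusive indices. The function name is hypothetical.

```python
from bisect import bisect_left, bisect_right


def interval_to_indices(coords, lo, hi):
    """Map the real interval [lo, hi] to array indices (i, j), or None if empty.

    coords must be sorted; i is the rank of the successor of lo and j is the
    rank of the predecessor of hi.
    """
    i = bisect_left(coords, lo)        # first point with coordinate >= lo
    j = bisect_right(coords, hi) - 1   # last point with coordinate <= hi
    return (i, j) if i <= j else None
```

The resulting pair (i, j) can then be passed to any array range mode query structure built over the items in coordinate order.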

In addition to results on the median, mode, and sum range query problems discussed in Sections 1
and 2, other range query problems examined on arrays include semigroups [?, 38, 39], extrema (e.g., range
minimum or maximum) [4, 6, 13, ?, 22], selection or quantiles (for which the median is a
special case) [?, 23], dominance or rank (counting the number of elements in the query range that
exceed a given input threshold) [?], coloured range (counting/enumerating the distinct elements in the
query range) [23], and k-frequency (determining whether any element has frequency k) [25]. Recently, range
query problems have been examined on multidimensional arrays, including partial sums [12], range minimum
[3, 8, 13, 35, 40], median [24], and selection [23].

3 Sparse Mode Table Method: O(n^ε) Query Time and O(n^{2-2ε}) Space

In the worst case, for every range mode query processed, the data structure of Krizanc et al. [30] makes a
sequence of Θ(n^ε) predecessor queries, each requiring Θ(log log n) time, for a total query time of O(n^ε log log n).
We build on the data structure of Krizanc et al. and introduce a different technique that avoids predecessor
search entirely. Section 3 establishes the following theorem and the corresponding corollary that follows
when ε = 1/2:

Theorem 2 Given an array A[1 : n] of n items, for any ε ∈ [0, 1/2] there exists a data structure requiring
O(n^{2-2ε}) storage space that supports range mode queries on A in O(n^ε) time in the worst case.

Corollary 3 Given an array A[1 : n] of n items, there exists a data structure requiring O(n) storage space
that supports range mode queries on A in O(√n) time in the worst case.

Data Structure Precomputation. Suppose the elements of A[1 : n] are drawn from an ordered bounded
universe U. Let D = {a_1, ..., a_k} ⊆ U denote the set of distinct elements stored in A. Construct an array
B[1 : n] such that for each i, B[i] stores the rank of A[i] in D. Therefore, B[i] ∈ {1, ..., k}. For any a, i,
and j, B[a] is a mode of B[i : j] if and only if A[a] is a mode of A[i : j]. Performing computation on array B
instead of array A allows direct array referencing using the values stored in B as indices. For simplicity, we
describe our data structures in terms of array B; a table look-up provides a direct bijective mapping from
{1, ..., k} to D. Set D, array B, and the value k are independent of any query range and can be computed
in O(n log k) time during preprocessing.
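
The rank reduction from A to B can be sketched as follows. This is a minimal illustration with a hypothetical function name; it uses 0-based ranks {0, ..., k - 1} rather than the 1-based ranks of the text, and relies on a hash-based dictionary instead of the balanced search tree implied by the O(n log k) bound.

```python
def rank_reduce(A):
    """Return (D, B): D lists the distinct items of A in sorted order and
    B[x] is the 0-based rank of A[x] in D, so that D[B[x]] == A[x]."""
    D = sorted(set(A))
    rank = {value: r for r, value in enumerate(D)}
    B = [rank[value] for value in A]
    return D, B
```

The list D plays the role of the look-up table that maps a rank back to the original element.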

Given fixed a and b, array C[1 : k] is a frequency table for B[a : b] if, for each i, C[i] stores the number of
occurrences of element i in B[a : b]. For any j > i, if C_i[1 : k] is a frequency table for B[1 : i] and C_j[1 : k]
is a frequency table for B[1 : j], then for each x, C_j[x] - C_i[x] is the frequency of element x in B[i + 1 : j].

For each a ∈ {1, ..., k}, let Q_a = {b | B[b] = a}. That is, Q_a is the set of indices b such that B[b] = a.
For any a, a range counting query for element a in B[i : j] can be answered by searching for the predecessors
of i and j, respectively, in the set Q_a; the difference of the indices of the two predecessors is the frequency of
a in B[i : j] [30]. As noted above, implementing such a range counting query using an efficient predecessor
data structure requires Θ(log log n) time in the worst case.

Figure 1: Example of the sparse mode table method data structure. The number of list items is
n = 24, of which k = 5 are distinct. If ε = 3/8, the array is partitioned into t = ⌈n/s⌉ = 6 blocks of size
s = ⌈n^ε⌉ = 4. The query range is A[i : j] = A[7 : 19], for which the unique mode is 20, occurring with
frequency 5. The corresponding mode of B[i : j] is 2. The query range B[7 : 19] is partitioned into the prefix
B[7 : 8], the span B[9 : 16], and the suffix B[17 : 19]. The span covers blocks b_i = 2 to b_j = 3, for which the
corresponding mode is S[2, 3] = 2, occurring with frequency S'[2, 3] = 4.

The following related decision problem, however, can be answered in constant time by a linear-space data
structure: does B[i : j] contain at least q instances of element B[i]? This question can be answered by a
select query that returns the index of the qth instance of B[i] in B[i : n]. For each a ∈ {1, ..., k}, store the
set Q_a as an ordered array (also denoted Q_a for simplicity). Define a rank array B'[1 : n] such that for all b,
B'[b] denotes the rank of b among the occurrences of B[b] in B[1 : n] (i.e., the index of b in Q_{B[b]}). Given
any q, i, and j, to determine whether B[i : j] contains at least q instances of B[i] it suffices to check whether
Q_{B[i]}[B'[i] + q - 1] ≤ j. Since array Q_{B[i]} stores the sequence of indices of instances of element B[i] in B,
looking ahead q - 1 positions in Q_{B[i]} returns the index of the qth occurrence of element B[i] in B[i : n]; if
this index is at most j, then the frequency of B[i] in B[i : j] is at least q. If the index B'[i] + q - 1 exceeds
the size of the array Q_{B[i]}, then the query returns a negative answer. This gives the following lemma:

Lemma 4 Given an array A[1 : n] of n items, there exists a data structure requiring O(n) storage space
that can determine in constant time for any {i, j} ⊆ {1, ..., n} and any q whether A[i : j] contains at least
q instances of element A[i].
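
The constant-time test behind Lemma 4 can be sketched as follows. The fragment is illustrative only: names are hypothetical, indices and ranks are 0-based, ranges are inclusive, and B is assumed to hold ranks in {0, ..., k - 1}.

```python
def build_occurrence_arrays(B, k):
    """Q[a] lists the positions of element a in increasing order;
    Bp[x] is the index of position x within Q[B[x]] (the array B' of the text)."""
    Q = [[] for _ in range(k)]
    Bp = [0] * len(B)
    for x, a in enumerate(B):
        Bp[x] = len(Q[a])
        Q[a].append(x)
    return Q, Bp


def has_at_least(B, Q, Bp, i, j, q):
    """True iff B[i..j] contains at least q >= 1 instances of B[i]."""
    occurrences = Q[B[i]]
    r = Bp[i] + q - 1  # index in Q[B[i]] of the q-th instance counted from i
    return r < len(occurrences) and occurrences[r] <= j
```

Each call performs one array look-up and two comparisons, independent of n and q.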

Following Krizanc et al. [30], given any ε ∈ [0, 1/2] we partition array B into t blocks of size s = ⌈n^ε⌉,
where t = ⌈n/s⌉ ≤ ⌈n^{1-ε}⌉. That is, for each i ∈ {0, ..., t - 2}, the ith block spans B[i · s + 1 : (i + 1)s] and
the last block spans B[(t - 1) · s + 1 : n]. We precompute tables S[0 : t - 1, 0 : t - 1] and S'[0 : t - 1, 0 : t - 1],
each of size Θ(t^2), such that for any {b_i, b_j} ⊆ {0, ..., t - 1}, S[b_i, b_j] stores a mode of B[b_i · s + 1 : (b_j + 1)s]
and S'[b_i, b_j] stores the corresponding frequency.

Finally, we need a frequency table C[1 : k] of size k, initialized to zero. The arrays Q_1, ..., Q_k can be
constructed in O(n) total time in a single scan of array B. The arrays S and S' can be constructed in
O(n^{2-ε}) time by scanning array B t times, computing one row of each array S and S' per scan. Thus, the
total precomputation time required to initialize the data structure is O(n^{2-ε}).

Range Mode Query Algorithm. Given a query range B[i : j], let b_i = ⌈(i - 1)/s⌉ and b_j = ⌊j/s⌋ - 1
denote the respective indices of the first and last blocks completely contained within B[i : j]. We refer
to B[b_i · s + 1 : (b_j + 1)s] as the span of the query range, to B[i : min{b_i · s, j}] as its prefix, and to
B[max{(b_j + 1)s + 1, i} : j] as its suffix. One or more of the prefix, span, and suffix may be empty; in
particular, if b_i > b_j, then the span is empty. See the example in Figure 1.

The value c = S[b_i, b_j] is a mode of the span with corresponding frequency f_c = S'[b_i, b_j]. If the span is
empty, then let f_c = 0. By Lemma 1, either c is a mode of B[i : j] or some element of the prefix or suffix
is a mode of B[i : j]. Thus, to identify a mode of B[i : j], we verify for every element in the prefix and
suffix whether its frequency in B[i : j] exceeds f_c and, if so, we identify this element as a candidate mode
and count its additional occurrences in B[i : j]. We present the details of this procedure for the prefix; an
analogous procedure is applied to the suffix.

We now describe how to compute the frequency of all candidate elements in the prefix over the range
B[i : j], storing these values in the frequency table C. Sequentially scan the items in the prefix starting
at the leftmost index, i, and let x denote the index of the current item. If C[B[x]] > 0, then an instance of
element B[x] appears in B[i : x - 1], and its frequency has been counted already; in this case, simply skip
B[x] and increment x. If C[B[x]] = 0, check whether Q_{B[x]}[B'[x] + f_c - 1] ≤ j (i.e., verify whether B[x] is a
candidate). If so, then the frequency of B[x] in B[i : j] is at least f_c. The exact frequency of B[x] in B[i : j]
can be counted by a linear scan of Q_{B[x]}, starting at index B'[x] + f_c - 1 and terminating upon reaching
either an index y such that Q_{B[x]}[y] > j or the end of array Q_{B[x]} (i.e., y = |Q_{B[x]}| + 1). That is, Q_{B[x]}[y]
denotes the index of the first instance of element B[x] that lies beyond the query range B[i : j] (or no such
element exists). Consequently, the frequency of B[x] in B[i : j] is y - B'[x]. Store this value in C[B[x]].

An analogous procedure is repeated for the suffix. Upon completing the scans of the prefix and suffix,
we identify a maximum value in array C; its index corresponds to a mode of B[i : j]. Only non-zero entries
in C need be examined (and subsequently reset to zero); this is achieved by making a second scan of the
prefix and suffix and examining the corresponding elements in array C.
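
The precomputation and query procedure of this section can be put together in the following Python sketch. It is an illustration under several assumptions of ours, not a definitive implementation: the class and attribute names are hypothetical, indices and ranks are 0-based with inclusive ranges, a query whose span is empty treats the whole range as the prefix, and the candidate test checks for a frequency strictly greater than f_c (the "exceeds" wording above) by looking ahead f_c positions.

```python
import math


class SparseModeTable:
    def __init__(self, A, eps=0.5):
        n = len(A)
        self.values = sorted(set(A))                  # D, in rank order
        rank = {v: r for r, v in enumerate(self.values)}
        self.B = [rank[v] for v in A]                 # rank-reduced copy of A
        k = len(self.values)
        self.Q = [[] for _ in range(k)]               # Q[a]: sorted positions of a
        self.Bp = [0] * n                             # Bp[x]: index of x in Q[B[x]]
        for x, a in enumerate(self.B):
            self.Bp[x] = len(self.Q[a])
            self.Q[a].append(x)
        self.s = s = max(1, math.ceil(n ** eps))      # block size
        t = math.ceil(n / s)                          # number of blocks
        self.S = [[None] * t for _ in range(t)]       # mode of blocks bi..bj
        self.Sf = [[0] * t for _ in range(t)]         # frequency of that mode
        for bi in range(t):                           # one scan of B per row
            count, mode, freq = [0] * k, None, 0
            for x in range(bi * s, n):
                a = self.B[x]
                count[a] += 1
                if count[a] > freq:
                    mode, freq = a, count[a]
                if (x + 1) % s == 0 or x == n - 1:    # a block ends at x
                    self.S[bi][x // s], self.Sf[bi][x // s] = mode, freq
        self.C = [0] * k                              # scratch frequency table

    def query(self, i, j):
        """Return (mode, frequency) of A[i..j], both endpoints inclusive."""
        B, Q, Bp, C, s = self.B, self.Q, self.Bp, self.C, self.s
        bi, bj = -(-i // s), (j + 1) // s - 1         # first and last full blocks
        if bi <= bj:
            best, fc = self.S[bi][bj], self.Sf[bi][bj]
            prefix, suffix = range(i, bi * s), range(j, (bj + 1) * s - 1, -1)
        else:                                         # empty span
            best, fc = None, 0
            prefix, suffix = range(i, j + 1), range(0)
        for x in prefix:                              # scan left to right
            a, r = B[x], Bp[x] + fc
            if C[a] == 0 and r < len(Q[a]) and Q[a][r] <= j:
                y = r + 1                             # count remaining occurrences
                while y < len(Q[a]) and Q[a][y] <= j:
                    y += 1
                C[a] = y - Bp[x]
        for x in suffix:                              # mirror image, right to left
            a, r = B[x], Bp[x] - fc
            if C[a] == 0 and r >= 0 and Q[a][r] >= i:
                y = r - 1
                while y >= 0 and Q[a][y] >= i:
                    y -= 1
                C[a] = Bp[x] - y
        best_f = fc
        for x in list(prefix) + list(suffix):         # second scan: max, then reset
            a = B[x]
            if C[a] > best_f:
                best, best_f = a, C[a]
            C[a] = 0
        return self.values[best], best_f
```

For example, `SparseModeTable(A).query(i, j)` returns one mode of A[i..j] together with its frequency; when several modes exist, any one of them may be returned.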

Storage Space and Query Time. If the prefix and suffix are empty, then S[b_i, b_j] is a mode of B[i : j],
and this value is returned in constant time. Without loss of generality, suppose the prefix contains at least
one item. Consider an arbitrary index x ∈ {i, ..., b_i · s} during the scan of the prefix. If C[B[x]] > 0,
then B[x] is processed in constant time. Therefore, suppose C[B[x]] = 0. That is, x corresponds to the
index of the first instance of B[x] in the prefix. Consequently, the frequency of B[x] in B[i : j], denoted f_x,
is equal to its frequency in B[x : j]. By Lemma 4, determining whether f_x > f_c requires only constant time.
Any item B[x] that is not a candidate is processed in constant time. Therefore, suppose B[x] is a candidate.
Since the prefix and suffix each have size at most s - 1, f_c ≤ f_x ≤ f_c + 2(s - 1).

Item B[x] incurs a cost of O(f_x - f_c) time for its first occurrence, and O(1) time for subsequent occurrences.
Since f_c is the frequency of the mode of the span, at least f_x - f_c instances of B[x] must occur in the prefix
or suffix. In other words, instances of element B[x] incur a total cost of O(c_x) time, where c_x denotes the
frequency of B[x] in the prefix and suffix. Since the number of items in the prefix and suffix is at most
2(s - 1), the total cost for processing the prefix is O(s). By an analogous argument, the total cost for
processing the suffix is also O(s). Identifying the maximum element in array C and re-initializing C to zero
requires O(s) time. Therefore, a range mode query requires O(s) = O(n^ε) time in the worst case. The
data structure requires O(n) space to store the arrays A, B, and B', O(n) total space to store the arrays
Q_1, ..., Q_k, and O(t^2) = O(n^{2-2ε}) space to store the tables S and S'. This gives O(n^{2-2ε}) total space for
O(n^ε) worst-case query time for any ε ∈ [0, 1/2], proving Theorem 2. As mentioned earlier, Ω(n) space is
required. Therefore, increasing ε beyond 1/2 increases query time without decreasing space.
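
As a small worked instance of this trade-off, the following Python fragment computes the block size s, the number of blocks t, and the combined number of entries in the tables S and S' for given n and ε. The function name is hypothetical, and the use of floating-point exponentiation is an assumption of this sketch; it may round differently from exact arithmetic for some inputs.

```python
import math


def table_parameters(n, eps):
    """Return (s, t, entries): block size, block count, and the total number
    of entries in the two t-by-t tables S and S'."""
    s = max(1, math.ceil(n ** eps))
    t = math.ceil(n / s)
    return s, t, 2 * t * t


print(table_parameters(24, 3 / 8))  # the setting of Figure 1: s = 4, t = 6
```

With n = 24 and ε = 3/8 this reproduces the values s = 4 and t = 6 of Figure 1, giving two tables of 36 entries each.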

4 Additional Linear-Space Range Mode Query Data Structures 

We apply results from Section 3 to obtain three additional O(n)-space data structures, giving the following
theorem:

Theorem 5 Given an array A[1 : n] of n items, there exists a data structure requiring O(n) storage space
that supports range mode queries on any A[i : j] in O(min{√n, k, |j - i|, m + log log n}) time in the worst
case, where k denotes the number of distinct elements in A and m denotes the frequency of the mode of A.

4.1 Sparse Frequency Table Method: O(k) Query Time and O(n) Space

We now describe an O(k + s) query time and O(n + n · k/s)-space data structure for any fixed s ∈ [1, n].
When s ∈ Θ(k), our data structure requires O(n) space and supports range mode queries in O(k) time. A
value of s ∈ o(k) (respectively, s ∈ ω(k)) results in ω(n) space (ω(k) time) without any reduction in query
time (space).

Figure 2: Example of the sparse frequency table method data structure. The number of list items
is n = 16, of which k = 5 are distinct. The array is partitioned into four blocks of size s = 4. The query
range is A[i : j] = A[6 : 15], for which elements 10 and 20 are modes, each occurring with frequency 3. The
corresponding modes of B[i : j] are 1 and 2. Thus, C[1] = C[2] = 3 is the maximum value in the frequency
array C.

Data Structure Precomputation. For each p ∈ {1, ..., n} such that p mod s = 0, construct a frequency
table C_p[1 : k] for the range B[1 : p]. Create one additional array C_0[1 : k], initialized to zero. There are
at most ⌈n/s⌉ + 1 such arrays C_p. See Figure 2. The preprocessing time required is O(n + n · k/s)
(or O(n log k + n · k/s) time if k or B must be computed).

Range Mode Query Algorithm. Array B is partitioned into blocks of size s as in Section 3. Given a
query range B[i : j], we refer to the sequence of blocks completely covered by B[i : j] as the span, and to
the remaining subarrays as the prefix and suffix, respectively. A query on B[i : j] is performed as follows:

1. Let p = s⌊(i - 1)/s⌋ and let p' = s⌊j/s⌋. That is, p is the largest p ≤ i - 1 such that array C_p is
defined. Similarly, p' is the largest p' ≤ j such that array C_{p'} is defined.

2. Create an array C[1 : k] such that for each x, C[x] ← C_{p'}[x] - C_p[x]. Upon completing this step, C is
a frequency table for B[p + 1 : p'].

3. For each x ∈ {p + 1, ..., i - 1}, set C[B[x]] ← C[B[x]] - 1. For each x ∈ {p' + 1, ..., j}, set C[B[x]] ←
C[B[x]] + 1. Upon completing this step C is a frequency table for the entire query range B[i : j].

4. Find a maximum value in C. If x' is an index that maximizes C[x'], then x' is a mode of B[i : j].

Storage Space and Query Time. The data structure consists of arrays A and B, requiring O(n) space, and
at most ⌈n/s⌉ + 1 frequency tables of size k. Thus, the total space required by the data structure is
O(n + n · k/s). Steps 1 through 4 of the algorithm require O(1), O(k), O(s), and O(k) time, respectively.
This gives O(n + n · k/s) total space for O(k + s) query time.
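
Steps 1 through 4 can be sketched in Python as follows. This is a hedged illustration: the class and attribute names are hypothetical, indices and ranks are 0-based with inclusive query ranges, `tables[b]` holds the frequency table of the first b · s items, and the default block size s = k is one possible choice satisfying s ∈ Θ(k).

```python
class SparseFrequencyTable:
    def __init__(self, A, s=None):
        self.values = sorted(set(A))                 # D, in rank order
        rank = {v: r for r, v in enumerate(self.values)}
        self.B = [rank[v] for v in A]
        self.k = k = len(self.values)
        self.s = s if s is not None else max(1, k)   # block size
        self.tables = [[0] * k]                      # tables[0] is all zeros
        running = [0] * k
        for x, a in enumerate(self.B, start=1):
            running[a] += 1
            if x % self.s == 0:                      # checkpoint after every s items
                self.tables.append(running.copy())

    def query(self, i, j):
        """Return (mode, frequency) of A[i..j], both endpoints inclusive."""
        s, B = self.s, self.B
        lo, hi = i // s, (j + 1) // s                # Step 1: nearest checkpoints
        C = [h - l for h, l in zip(self.tables[hi], self.tables[lo])]  # Step 2
        for x in range(lo * s, i):                   # Step 3: remove items before i
            C[B[x]] -= 1
        for x in range(hi * s, j + 1):               # Step 3: add items up to j
            C[B[x]] += 1
        best = max(range(self.k), key=C.__getitem__)  # Step 4
        return self.values[best], C[best]
```

Each query allocates a fresh table C of size k, which matches the O(k + s) query time stated above.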

4.2 Low Frequency Mode Method: O(m + log log n) Query Time and O(n) Space

Using a combination of ideas from Section 3 and from an approximate range mode query data structure of
Greve et al. [25], we briefly describe a range mode data structure parameterized in terms of the frequency
of the mode, m, with good bounds on space and query time when m is small (e.g., m ∈ O(√n)).

As in Section 3, the rank array B' and the arrays Q_1, ..., Q_k are constructed, and array B is partitioned
into blocks of size s. For each i ∈ {0, ..., n} such that i mod s = 0, construct an array F_i[1 : m] such that for
each x, F_i[x] stores the largest j ≤ n such that the mode of B[i : j] has frequency at most x; a corresponding
mode is also stored. A query range B[i : j] is divided into prefix, span, and suffix subarrays as before. Let
p = s⌈i/s⌉ denote the index of the first element of the span. Using the technique of Greve et al. [25], a
mode of the span and its frequency are computed by finding the successor of j in F_p; this can be achieved
in O(log log n) time by an O(n)-space data structure (e.g., a van Emde Boas tree [?] or a y-fast
trie [37]). By Lemma 4, determining whether the frequency of an element in the prefix or suffix exceeds
that of the mode of the span requires only constant time per element, or O(s) total time. The resulting
worst-case query time is O(s + log log n) using O(n + n · m/s) space. Choosing s ∈ Θ(m) gives O(n) space
and O(m + log log n) query time.



4.3 Counting Method: O(|j - i|) Query Time and O(n) Space

We briefly describe an O(|j - i|)-time and O(n)-space data structure. No actual precomputation is necessary
other than constructing the array B, finding k, and initializing a frequency table C[1 : k] to zero, all of which
can be achieved in O(n log k) precomputation time. This algorithm is similar to counting sort: compute a
frequency table for B[i : j] stored in C[1 : k], then identify a maximum element in C[1 : k]. When computing
the maximum, the running time is bounded by O(|j - i|) by only examining indices in C that correspond
to elements in B[i : j] (these are exactly the elements of C that have non-zero values). This procedure is
repeated after identifying the maximum to reset C[1 : k] to zero. Each step requires O(|j - i|) time and the
total space required by the data structure is O(n).
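
The counting method can be sketched as follows. As with the other sketches, this is an illustration with hypothetical names, 0-based indices, and inclusive query ranges; the shared table C is reset after each query so that the cost stays proportional to the length of the query range.

```python
class CountingMode:
    def __init__(self, A):
        self.values = sorted(set(A))                 # D, in rank order
        rank = {v: r for r, v in enumerate(self.values)}
        self.B = [rank[v] for v in A]
        self.C = [0] * len(self.values)              # frequency table, all zeros

    def query(self, i, j):
        """Return (mode, frequency) of A[i..j], both endpoints inclusive."""
        B, C = self.B, self.C
        for x in range(i, j + 1):                    # build the frequency table
            C[B[x]] += 1
        best = B[i]
        for x in range(i, j + 1):                    # examine only touched entries
            if C[B[x]] > C[best]:
                best = B[x]
        freq = C[best]
        for x in range(i, j + 1):                    # reset touched entries to zero
            C[B[x]] = 0
        return self.values[best], freq
```

Each of the three passes visits only the j - i + 1 items of the query range, never the whole table C.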



5 Higher Dimensions 

A natural question is whether our results for one-dimensional range mode query extend to arbitrary
dimensions. The array B[1 : n] is replaced by a d-dimensional array B[1 : n_1, ..., 1 : n_d], containing n elements in
total with dimensionality n_1, ..., n_d, where n = n_1 × ... × n_d. Within Section 5 we refer to a d-dimensional
tuple (e.g., i = [i_1, ..., i_d]) as an array index (e.g., B[i]). We say a tuple i dominates another tuple j if and
only if i_t ≤ j_t for all t ∈ {1, ..., d}. We denote the input array as B[1 : n], where n = [n_1, ..., n_d]. A range
is defined over a d-dimensional rectangle of indices, uniquely determined by two indices, [i : j], where i ≤ j.

A key element of our one-dimensional data structures is the use of frequency tables. In d dimensions,
array C[1 : k] is a frequency table for B[a : b] if, for each i ∈ {1, ..., k}, C[i] stores the number of occurrences
of element B[x] = i in B[a : b]. Unlike the one-dimensional case, if C_i[1 : k] is a frequency table for B[1 : i]
and C_j[1 : k] is a frequency table for B[1 : j], then C_j[B[x]] - C_i[B[x]] is not the frequency of B[x] in B[i : j]
in general. In one dimension, i dominates all indices that are to be excluded from the count, whereas this is
not the case in higher dimensions. Instead, the 2^d corners of the d-rectangle [i : j] can be used to compute
the frequency table with typical inclusion-exclusion rules [13]. The result is computed using 2^d d-directional
range counting queries to determine the frequency of B[x] in B[a : b]. In the range searching literature it is
typical to assume d to be a small known constant and for the corresponding factors of d to be omitted from
the evaluation of space and time requirements.
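
For d = 2, the inclusion-exclusion computation over the 2^d = 4 corners can be sketched as follows. This fragment is only an illustration of the principle: it stores a frequency table at every index (which is not space-efficient), function names are hypothetical, B is assumed to be a list of equal-length rows holding ranks in {0, ..., k - 1}, and indices are 0-based and inclusive.

```python
def build_corner_tables(B, k):
    """P[r][c] is the frequency table of the sub-array B[0..r-1][0..c-1]."""
    rows, cols = len(B), len(B[0])
    P = [[[0] * k for _ in range(cols + 1)] for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            cell = P[r + 1][c + 1]
            for a in range(k):
                cell[a] = P[r][c + 1][a] + P[r + 1][c][a] - P[r][c][a]
            cell[B[r][c]] += 1
    return P


def rectangle_frequencies(P, r1, c1, r2, c2):
    """Frequency table of B[r1..r2][c1..c2], combining the four corner tables."""
    k = len(P[0][0])
    return [
        P[r2 + 1][c2 + 1][a] - P[r1][c2 + 1][a] - P[r2 + 1][c1][a] + P[r1][c1][a]
        for a in range(k)
    ]
```

A mode of the rectangle is then any index that maximizes the returned table. In d dimensions the same pattern uses 2^d signed corner terms.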

Counting Method. The counting method described in Section 4.3 does not depend on any properties of
one-dimensional data and extends to d-dimensional data and queries. The query time is directly proportional
to the cardinality of the query range [i : j]: O(∏_{l=1}^{d} (j_l - i_l + 1)). Precomputation time, query time, and
space requirements are analogous to those of the one-dimensional data structure.

Sparse Frequency Table Method. We now consider a generalization to d dimensions of the sparse
frequency table method described in Section 4.1. As in the one-dimensional data structure, for every t ∈ T
we precompute a frequency table C_t[1 : k] for the range B[1 : t], where T ⊆ [1 : n] is a fixed subset of indices.
If T is a sparse set whose elements are distributed regularly across [1 : n], then a frequency table for the span
can be computed in O(2^d k) time and O(n) space using the inclusion-exclusion principle. The remainder of
the query algorithm consists of examining each index w in the enclosing set W = [i : j] \ [b_i : b_j] (known
as the suffix and prefix in the one-dimensional case) and incrementing the corresponding frequency count
C[B[w]]. Finally, the maximum value of the frequency table C determines the frequency of the mode; this
maximum is identified in O(k) time. Therefore, the total query time is O(2^d k + |W|).

The regular positioning of the indices in T forms a d-dimensional grid that divides B[1 : n] evenly into
|T| cells, each of which is a d-rectangle of cardinality s = n/|T|. Each frequency table has size k. In order for
the space occupied by the frequency tables to remain linear there can be at most O(n/k) such tables (e.g.,
let |T| = ⌈n/k⌉ and s = k). We set the width of each cell in the lth dimension to be Θ(n_l (k/n)^{1/d}). Observe
that ∏_{l=1}^{d} n_l (k/n)^{1/d} = k. Since there are s = k items in a cell, the number of items on the cell's surface
perpendicular to dimension l is

    O(k / (n_l (k/n)^{1/d})) = O((k/n_l) (n/k)^{1/d}).

Observe that |W| is at most s times the number of cells on the external surfaces of the d-rectangle
specified by the query range [i : j]. The total number of items on the external surface perpendicular to some
dimension l ∈ {1, ..., d} is O(n/n_l). Thus the number of cells on that external surface is

    O((n/n_l) / ((k/n_l) (n/k)^{1/d})) = O((n/k)^{1-1/d}).

Therefore, |W| ∈ O(d · k (n/k)^{1-1/d}) = O(d · n^{1-1/d} k^{1/d}), resulting in a total query time of
O(2^d k + d · n^{1-1/d} k^{1/d}). If k is constant, then the query time can be improved to O(2^d k) using
O(n · k) space by including a frequency table for every item in B.

Sparse Mode Table Method. The sparse mode table method described in Section 3 and the sparse
frequency table method both specify a subset T of indices positioned at regular intervals for which any pair 
determines a span within the array B. Instead of storing frequencies for all elements in D, however, the 
sparse mode table method stores a precomputed mode of the span between any two indices in T. The mode 
of the query range is then found by searching for elements in the prefix and suffix whose frequency exceeds 
that of the mode of the span. 

This data structure exemplifies the space-time trade-off. The O(√n) query time and O(n) space bounds
of the one-dimensional data structure are possible because the cardinality of the prefix and suffix can be
kept small while minimizing the time required to measure the frequency of elements in the prefix and suffix.
In particular, the one-dimensional data structure supports a constant-time query to determine whether the
frequency of a given element exceeds that of the mode of the span. This is achieved by referring to the arrays
Q_1, ..., Q_k. These arrays, however, do not generalize easily to higher dimensions. A corresponding decision
query would be: "Does element B[x] occur at least m times in the block B[i : j]?" Replacing the arrays
Q_1, ..., Q_k with orthogonal range counting data structures answers the query: "How frequently does element
B[x] occur in the block B[i : j]?" A range counting query computed using kd-trees gives a linear-space data
structure with O(|Q[B[x]]|^{1-1/d}) query time [31]. Bentley and Maurer [5] describe a linear-space data structure
with a faster query time of O(|Q[B[x]]|^ε) for any fixed ε < 1, where the time and space bounds omit constant
factors of ε.

As in Section 5, let W denote the enclosing set of indices (i.e., the indices of the query range not contained
in the span). Let D_W denote the set of distinct elements contained in W. Thus the range mode query time
is^1

    O(max{ Σ_{u ∈ D_W} |Q[u]|^ε , |W| }) ⊆ O(max{n^ε, |W|}).    (1)

^1 Our data structure includes kd-trees. In the corresponding analysis of Lee and Wong [31], d is assumed to be constant;
consequently, constants dependent upon d do not appear in (1).
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The arrays S and S" respectively store a mode and frequency of the span B[bi : bj] for all {bi,bj} C T. 
Maintaining linear space requires that 6(|T|) = 6(s) — Q{y/n). We set the number of elements per cell 
in the $l$th dimension to be $O(\sqrt{n_l})$. Thus the number of elements on the surface of the cell perpendicular to 
the $l$th dimension is $O(\sqrt{n/n_l})$. The total number of elements on the external surface perpendicular to some 
dimension $l \in \{1, \ldots, d\}$ is $O(n/n_l)$. Thus the number of cells on the external surface is $O((n/n_l)\sqrt{n_l/n}) = O(\sqrt{n/n_l})$. 
Therefore, 

If all values ni are equal, then ^ simplifies to 0{d ■ n^^A). 



6 Discussion and Directions for Future Research 



Generalizing Mode. The sparse frequency table and counting methods described in Sections 4.1 and 4.3, 
respectively, can be generalized to return the xth most frequently occurring element in the query range 
A[i : j] for any $x \in \{1, \ldots, k\}$ by employing a linear-time (i.e., $O(\min\{k, |j - i|\})$-time) selection algorithm to find 
the xth largest element in the frequency table for A[i : j]. Due to its dependence on precomputed modes 
stored in array S, an analogous generalization seems unlikely without a significant increase in space for the 
sparse mode table method described in Section 3. 
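
The generalization can be sketched as follows: count frequencies in the query range, then select the xth largest count. This is only an illustration under stated assumptions: indices are 0-based and inclusive, the function names are hypothetical, and a simple quickselect (linear expected time on typical inputs, quadratic in the worst case) stands in for a worst-case linear-time selection algorithm.

```python
from collections import Counter


def kth_largest(values, x):
    """Return the x-th largest value (1-based) via three-way quickselect."""
    items = list(values)
    while True:
        pivot = items[len(items) // 2]
        larger = [v for v in items if v > pivot]
        equal = [v for v in items if v == pivot]
        if x <= len(larger):
            items = larger
        elif x <= len(larger) + len(equal):
            return pivot
        else:
            # Discard the pivot and everything larger; adjust the rank.
            x -= len(larger) + len(equal)
            items = [v for v in items if v < pivot]


def xth_most_frequent(A, i, j, x):
    """Return (element, frequency) for an x-th most frequent element of A[i..j]."""
    freq = Counter(A[i:j + 1])  # frequency table for the query range
    if not 1 <= x <= len(freq):
        raise ValueError("x must be between 1 and the number of distinct elements")
    target = kth_largest(freq.values(), x)
    # Ties are broken arbitrarily: any element with the selected count qualifies.
    for element, count in freq.items():
        if count == target:
            return element, count
```

With x = 1 the function returns a mode of the range, so the ordinary range mode query is the special case.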

Open Problem 1 Given a list A[1 : n] of n items, construct an $O(n)$-space data structure for identifying 
the xth most frequently occurring element in the range A[i : j] with $O(\sqrt{n})$ query time, where i, j, and x are 
provided at query time. 

Dynamic Range Mode Query. Prior discussion has been restricted to static data structures for range 
mode query. Dynamically updating the list of items is a natural operation: $A[i] \leftarrow x$. Unlike the range 
median query problem for which dynamic data structures exist [101 13 [Ml HH] , none of the previous data 
structures for range mode query [71 |30l ESJ |33l [34| support efficient updates. We briefly discuss some of the 
challenges of making our data structures dynamic. 

Both the sparse frequency table and counting methods described in Sections 4.1 and 4.3, respectively, 
permit straightforward constant-time updates when the set of distinct elements, D, remains unchanged. 
Updates that modify D, however, require careful consideration. A key issue in defining dynamic data 
structures analogous to the static data structures described in this paper is to generalize the mapping 
defined by array B (see Section 3) to support efficient updates. We have preliminary results demonstrating 
that such updates are possible for implementing a dynamic version of the counting method. As the data 
structure for the sparse frequency method is currently specified, however, updates that modify D require 
$\Omega(n)$ time in the worst case. The sparse mode table method described in Section 3 does not suggest itself as 
a good candidate for efficient updates. In particular, the table S requires $\Theta(n)$ updates in the worst case, 
even if D remains unchanged. Also challenging is the problem of updating the arrays $Q_1, \ldots, Q_k$. Each set 
$Q_x$ is stored as a sorted array to enable direct indexing, resulting in $O(n)$ update time in the worst case. 
Thus, the problem of defining an efficient dynamic range mode query data structure remains open. 
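
To illustrate why sorted arrays make updates expensive, here is a minimal sketch of a point update that keeps each position array sorted. It assumes 0-based indices and a dictionary `Q` mapping each element to its sorted list of positions; `point_update` is a hypothetical name, not part of the paper.

```python
from bisect import bisect_left, insort


def point_update(A, Q, i, x):
    """Set A[i] = x while keeping every Q[a] a sorted list of positions of a."""
    old = A[i]
    if old == x:
        return
    positions = Q[old]
    # Deleting from the middle of an array shifts every later entry.
    del positions[bisect_left(positions, i)]
    if not positions:
        del Q[old]  # the set of distinct elements D shrinks
    # Inserting into a sorted array also shifts later entries.
    insort(Q.setdefault(x, []), i)
    A[i] = x
```

Locating the position takes logarithmic time, but the deletion and insertion each shift up to the full length of the array, which may be linear in n. Any stored rank of an index within its array would also need renumbering after such a shift.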

Open Problem 2 Given an array A[1 : n] of n items, construct a dynamic data structure that supports 
efficient range mode queries and updates. 

Geometric Range Mode Query. The range mode problem has a natural definition in Euclidean space: 

Open Problem 3 Given a multiset P of n points in $\mathbb{R}^d$, construct a data structure to support queries that 
return a mode of $P \cap R$ for an arbitrary (orthogonal) query range $R \subseteq \mathbb{R}^d$. What is the time complexity of 
such a range query for a given space bound? 
Interpreted differently, an instance of Open Problem 3 is a set of points $P' \subseteq \mathbb{R}^d$ such that each point $p \in P'$ 
is assigned a colour. In this case, the mode of $R \cap P'$ is the most frequently occurring colour in the query 
region. As discussed in Section 2, when $d = 1$, this problem reduces to range mode query on an array. When 
$d \geq 2$, however, solution techniques tend to differ extensively for range searching problems set in continuous 
Euclidean space versus those restricted to array input. 

A range reporting query can be combined with a mode-finding algorithm (e.g., the counting method 
described in Section 4.3) to identify the multiset of points within the query range and then compute its 
mode. Such a solution requires enumerating all elements in the query range, possibly resulting in poor query 
time (e.g., when $|R \cap P| \in \Theta(|P|)$). A more ingenious solution might reduce query time by avoiding the use 
of a range reporting query. Other than a basic combination approach such as that described above, the range 
mode query problem in the continuous setting remains open. 
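
The basic combination approach can be sketched as below. This is an illustration only: a linear scan stands in for a real range reporting data structure, points are assumed to be given as (coordinates, colour) pairs, the query range is a closed orthogonal box, and `range_mode_by_reporting` is a hypothetical name.

```python
from collections import Counter


def range_mode_by_reporting(points, lo, hi):
    """Return (colour, frequency) of a mode among the points inside the box.

    points: iterable of (coords, colour) with coords a tuple of d numbers.
    lo, hi: tuples giving the lower and upper corner of the closed box.
    Returns None when the box contains no points.
    """
    # Reporting step: enumerate every point in the query range.
    counts = Counter(
        colour
        for coords, colour in points
        if all(l <= c <= h for l, c, h in zip(lo, coords, hi))
    )
    if not counts:
        return None
    # Mode-finding step: count colours among the reported points.
    return counts.most_common(1)[0]
```

The running time is at least proportional to the number of reported points, which is the source of the poor query time noted above when the range contains a constant fraction of the input.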

Lower Bounds. Recently, Greve et al. [2S] showed that any data structure that uses s memory cells of w 
bits requires $\Omega(\log n / \log(s \cdot w / n))$ time to answer a range mode query. For linear-space data structures in 
the RAM model, $s \cdot w \in \Theta(n \log n)$, corresponding to a lower bound of $\Omega(\log n / \log\log n)$ query time. Other 
than the bound of Greve et al. and the lower bounds on the problem of computing a mode of a multiset 
(see Section 2), little is known regarding non-trivial lower bounds for the time complexity of the range mode 
query problem. In particular, it is unknown whether there exists a linear-space data structure that supports 
$o(\sqrt{n})$ query time. 
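
The step from the general bound to the linear-space bound can be written out as follows, assuming that linear space means $s \in \Theta(n)$ cells of $w \in \Theta(\log n)$ bits each (an assumption stated here for illustration):

```latex
\[
s \cdot w \in \Theta(n \log n)
\;\Longrightarrow\;
\log\frac{s \cdot w}{n} \in \Theta(\log\log n)
\;\Longrightarrow\;
\Omega\!\left(\frac{\log n}{\log(s \cdot w / n)}\right)
= \Omega\!\left(\frac{\log n}{\log\log n}\right).
\]
```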

Open Problem 4 Identify a function $f(n)$ such that any $O(n)$-space data structure that supports range 
mode query on an array of n items requires $\Omega(f(n))$ query time in the worst case, where $f(n) \in \omega(\log n / \log\log n)$, 
or provide an $O(n)$-space data structure that supports $O(\log n / \log\log n)$-time queries. 

The corresponding question for range selection query was recently solved by Jørgensen and Larsen [29], 
who showed a lower bound of $\Omega(\log r / \log\log n)$ and a linear-space data structure with $O(\log r / \log\log n + \log\log n)$ 
query time, where r denotes the rank of the selection query. 